Peering into the Black Box: Investigating the efficacy of a popular social media monitoring tool using AI

By: Ari Sen

Introduction

Across the country millions of students are returning to campuses or setting foot on them for the very first time. This new period in students’ lives brings lots of emotions — excitement and anxiety, fear and joy, connection and loneliness.

For the past decade, many have turned to social media to express these feelings to their circle of friends, family and peers. But this trusted group may not be the only ones watching: across the country, dozens of colleges have purchased a technology called Social Sentinel, in what they say is an attempt to keep the worst from happening.

Background

Social Sentinel is a service sold to schools that scans social media posts in an attempt to detect threats of violence or self-harm. The company has said in the past that it scans more than a billion social media posts every day against the more than 450,000 words and phrases in its “Language of Harm.” The service started its life as Campus Sentinel, an app that tracked crime statistics for campuses across the country. The app was created by Gary Margolis and Steven Healy, two former campus police chiefs who formed a security consulting company together in 2008. Sometime in late 2014, the two men shifted the app's purpose away from crime stats and toward social media monitoring, rebranding it “Social Sentinel.” So far as we can tell from the company's patents, documents I've obtained and news articles, Social Sentinel's system consists of two main pieces:
  1. An AI system that detects potentially threatening messages posted to social media
  2. A method which associates a flagged message with a specific school community
My reporting suggests that Social Sentinel has been used by at least 37 colleges in the past six years. Some of the country’s largest and most well-known colleges have used Social Sentinel, including UNC-Chapel Hill, the University of Virginia, Michigan State University, MIT and Arizona State University.

This number is also very likely an undercount: an email we obtained from one school says the service is used by "hundreds" of colleges and universities in 36 states across the country. Still, that figure is small compared with the "thousands" of K-12 schools where the same technology is used.

Perhaps because of these comparatively low numbers, very little attention has been paid to the college campuses that use this technology. In my view, that is a mistake, and one I hope to address with this story.

Unlike at the K12 level, where alerts are often sent to mental health counselors or school administrators, at the college level Social Sentinel’s alerts are typically sent to campus police officers. This is interesting for two reasons.

The first is that, unlike regular police, who answer to a chief who answers to a mayor or city council, campus police are essentially unaccountable to the population they serve. A college student usually gets no vote on who the chancellor or president of their university will be, nor can they usually argue against a police action before it takes place.

The second is that campus police were, quite literally, created to suppress student activism. Although the first campus police department was established at Yale in 1894, most other universities didn't follow suit until the late 1960s and early 1970s. According to historians, these departments were largely formed to quash student protests against the ongoing war in Vietnam.

Reporting Hypotheses

My hypotheses are as follows:

  1. Social Sentinel is used by campus police not just for its stated purpose of preventing suicides and shootings, but also for suppressing protests and activism
  2. Social Sentinel is not an effective tool for preventing suicides and shootings on campuses, or at the very least is not as effective as the company claims it is

In this project I will mainly be focusing on the second hypothesis.

Data

My data is a collection of 1,236 tweets, nearly 400 of which were flagged by Social Sentinel as potential threats. The flagged tweets were gathered by Peter Aldhous and Lam Vo for their 2019 story Your Dumb Tweets Are Getting Flagged To People Trying To Stop School Shootings. The unflagged tweets were scraped from Twitter using the Twint library in Python. These tweets were gathered from the same users over the same time period as the flagged tweets, plus or minus a week for users with only one flagged tweet.

Method

To test my reporting hypotheses, I generated embeddings for every tweet using BERTweet and then clustered them using k-means with a k of 2. I then compared the cluster assignments to the labels Social Sentinel assigned to the tweets and to my human-annotated labels of whether each tweet was threatening or not.
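As a rough illustration of this step, here is a minimal sketch of the k-means clustering using scikit-learn. Random vectors stand in for the real BERTweet embeddings, which would come from running each tweet through the BERTweet model via the transformers library:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the 1,236 tweet embeddings (BERTweet produces 768-dim vectors);
# random data is used here purely to illustrate the clustering step.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1236, 768))

# Cluster into two groups, mirroring the binary threat / no-threat framing.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(embeddings)
```

Each tweet ends up with a cluster label of 0 or 1, which can then be lined up against Social Sentinel's flags and the human annotations.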

To generate visualizations, I used t-SNE to reduce the dimensionality of the BERTweet embeddings to two and plotted each tweet on an x, y coordinate plane.
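The reduction step can be sketched the same way, again with random vectors standing in for the real embeddings; scikit-learn's TSNE is one common implementation:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in embeddings (a small batch keeps the example fast).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 768))

# Project to two dimensions for plotting; perplexity must stay
# below the number of samples.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
```

Each row of `coords` is one tweet's (x, y) position, ready to be colored by whichever labeling is being compared.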

I colored the plots:

  1. Based on the cluster labels
  2. Based on my human annotated labels
  3. Based on Social Sentinel's labels

The plots are reproduced below:

Cluster Labels

My Labels

Social Sentinel's Labels

I also used topic modeling to investigate the salience of groups of words in the corpus:

Analysis

The results from the clustering suggest that neither my system nor Social Sentinel's method is very accurate. The company's model did perform better on this metric, though, scoring 0.696 to my 0.574 when compared against the human-annotated labels.

However, my method produces far fewer false positives, meaning much higher precision: 0.310 for my system versus 0.065 for Social Sentinel.
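Both metrics can be computed with scikit-learn; the labels below are hypothetical stand-ins for the real annotations, just to show the comparison:

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical labels: 1 = threatening, 0 = not. The real comparison
# uses the human annotations against each system's flags.
human  = [0, 0, 1, 0, 1, 0, 0, 1]
system = [0, 1, 1, 0, 1, 1, 0, 0]

acc = accuracy_score(human, system)    # share of tweets where labels agree
prec = precision_score(human, system)  # share of flagged tweets that are truly threats
print(acc)   # 0.625
print(prec)  # 0.5
```

Precision is the metric that captures false positives: of everything a system flags, how much is actually a threat.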

This should be a concerning finding for the company and any school that uses this technology, given that I built my system in only a few hours and had far less training data to work with for this task.

Why it matters

Social Sentinel has repeatedly claimed to have significantly reduced its false positives:

  1. From January to June of 2018, the company claimed it reduced its false positives by 52%
  2. On October 4, 2018, it said it had reduced its false positives by 80%, and it repeated the claim on January 4, 2019
  3. On May 3, 2019, Margolis said “our technology removes nearly 99% of the false positives”

My analysis suggests that these claims are at best dubious and at worst outright fabrications. The system, as evaluated here, is thus likely to be a waste of both money and precious policing and mental health resources.

This reporting also raises the question: if it isn't threats of suicide and shootings the system is surfacing, what is it catching?

My reporting so far suggests that the answer may be protests and student activism.

Next Steps

  1. Parse the new set of tweets we recently obtained from Gulf Coast State College into a CSV, and re-run these analyses
  2. Obtain more tweets and, when possible, try to build a real classifier
  3. Get more human annotators, including experts, and take the inter-annotator agreement as a comparison set
  4. Bring these findings to Social Sentinel and to the colleges to see what they say about it